Document Validation System and Method

ABSTRACT

The present invention relates generally to the field of self-validating documents in supply chain management, documentation services and method for creating the same.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. application Ser. No. 11/614,811, filed Dec. 21, 2006, which is a nonprovisional of U.S. application Ser. Nos. 60/752,980, filed Dec. 21, 2005 and 60/755,897, filed Dec. 30, 2005, the entire disclosures of which are herein expressly incorporated by reference.

BACKGROUND OF THE INVENTION

The present invention relates generally to the field of self-validating documents in supply chain management, documentation services, method and data processing system for creating the same.

Millions of documents are passed in global commerce between supplier and recipient containing control statements within certification documents, such as for the safe use and handling of a product or its compliance with applicable restrictions. Studies have shown a high rate of error in such documents.

Global trade in products between a supplier and a customer depends upon the control statements made in certification documents, such as Material Safety Data Sheets and Safety Data Sheets (MSDS or SDS), letters of certification or compliance certifications, because such control statements define the parameters of use of the product. For example, an MSDS for a hazardous substance or formulation has become the common means by which the supplier communicates to the customer the controls necessary for safe handling of the product as well as its compliance with applicable restrictions, whether in the U.S. at the federal or state level, or the requirements of another country or international convention.

With regard to other types of products, a letter of certification or compliance certification document from the supplier of food and consumer products contains control statements that communicate requirements applicable to the use of the product. For instance, a certification document might accompany the supplier's shipment of a food packaging material to stipulate that the product could be used only with certain types of foodstuffs under the requirements of the Food and Drug Administration or similar governmental agencies of other countries. Such certifications may relate to regulations, standards, religious codes (e.g., keeping Kosher), scientific studies and the like. Millions of such documents are generated and transmitted every year in many different languages and countries for many different types of products and uses.

In many cases such documents are a compilation of standard control statements defining various parameters of use of the product. It is common for such documents to be prepared and generated using a document authoring system or enterprise resource planning (ERP) system such as SAP from a phrase library that may have different language variants.

However, although the recipient of a generated certification document has the control statements for the product, he is not able to obtain or validate the source document supporting a control statement in an automatic way. Nor can the recipient automatically determine whether a change relevant to a control statement in the received certification document might have occurred from the time of the document's creation.

Moreover, the recipient may wish to use the product in a different market or area of the world, and is unable to relate the control statements relevant in one jurisdiction to parameters of use in other jurisdictions or areas. Independently of the supplier's certifications, the recipient may also simply wish to review the control statements in a certification document to determine whether information is missing or whether he requires additional information, which he would obtain by reviewing the source document of such a statement. Finally, the recipient may wish to relate the general information conveyed about the product in the received document to information about a specific shipment of that product received from the supplier where the shipment has, for example, a particular RFID code. This last aspect is especially important where a product recall or alert has occurred for specific shipments of a product.

It is desirable, therefore, to provide a data processing system to support the automatic validation of control statements made about products flowing through the supply chain. Normally, validation of a control statement is done as a manual task by the recipient. Providing a data processing system for such information will improve the safety of products in the supply chain, will improve the transparency of global product requirements, will reduce cost of product approval, and will reduce mistakes.

It is also desirable for the customer to validate the control statements of the supplier whenever possible through an automatic data processing system. Although the customer must legally rely on the statements of the supplier, a prudent customer may wish to independently validate such a certification by looking up the reference to determine that it is current or to assure himself or herself that important omissions have not been made.

The communication of control statements is not simply a one-to-one relationship between a supplier and a customer, but rather a many-to-one relationship between multiple upstream suppliers in a supply chain and the customer. The customer may receive a certification document with control statements that depend upon the specific claims of an upstream manufacturer of raw materials used by the immediate supplier of the customer; however, the upstream manufacturer may be unwilling to disclose important source information to the immediate supplier without a non-disclosure agreement, because of claims of confidentiality or trade secrecy.

For example, a manufacturer of a plastic sells to a small converter that produces formed cups for a yogurt food processor. The small converter may provide certifications, but these depend on the materials used in the conversion process. Often it is not the certification statement itself that is confidential; rather it is the source document supporting the statement that is confidential (e.g., test results or toxicological study). Thus, the yogurt food processor has a critical need to be assured of claims or compliance certifications that include both the immediate supplier and the upstream raw material suppliers. The need of the customer is to validate control statements of the immediate supplier as well as, to the extent permitted by the upstream supplier and under terms agreed to by the customer, the control statements passed through the supply chain from upstream manufacturers that concern raw materials or other conditions important to the immediate customer's use of the received product.

Many such certification documents transmitted by suppliers to customers, important though they are, contain omissions or errors. Indeed, according to a recent study of the completeness of safety data sheets: “The deficiencies for the different headings [that is, of the 16 sections of a standard format MSDS] vary between twenty percent and forty percent”. ECLIPS: “European Classification and Labeling Inspections of Preparations, including for Safety Data Sheets”, Final Report 2004 published by the European Enforcement Network, page 11. In consequence, the control statements made in the safety data sheets reviewed in the study have deficiencies that may include missing control statements, out-of-date control statements, or other errors. Further, according to the report, the error rates of regulatory statements in section fifteen of the MSDS, where required regulatory certification statements are made, averaged 35%. Ibid. Similar findings have resulted from Canadian studies. Welsh M. S.; Lamesse M.; Karpinski E. “The Verification of Hazardous Ingredients Disclosures in Selected Material Safety Data Sheets.” Applied Occupational and Environmental Hygiene, Volume 15, Number 5, 1 May 2000, pp. 409-420(12). OSHA has performed studies of MSDS quality:

-   Based on the chemical ingredients identified, the accuracy in the other four areas of concern was evaluated based on information obtained from readily available reference sources. The evaluation indicated that 37% of the MSDSs examined accurately identified health effects data, 76% provided complete and correct first aid procedures, 47% accurately identified proper personal protective equipment, and 47% correctly noted all relevant occupational exposure limits. Only 11% of the MSDSs were accurate in all four information areas, but more (51%) were judged accurate, or considered to include both accurate and partially accurate information, than were judged inaccurate (10%). (Found at the world wide web address osha.gov/dsg/hazcom/finalmsdsreport.html).

Given the importance of such certification documents and the control statements that they contain to the safety of the recipient, means to improve accuracy, as addressed in the present invention, should be established. A number of studies agree: error rates in supplier certification statements are high.

In the area of food safety, FDA has established processes for review of hazards: Hazard Analysis and Critical Control Point (HACCP). Nevertheless, frequent reports appear where a food processor has purchased a material that contains a contaminant not reviewed adequately.

The probability of error between supplier and customer increases with the volume of certification documents and the number of suppliers. In chemical-using industries, the number of raw materials for a single manufacturer can be thousands or tens of thousands and the number of suppliers in the hundreds or thousands. The same holds true in the food-processing and food-related industries. As a result there is an essential need to improve methods of validation of supplier's statements and to monitor important changes that may have occurred that relate to the supplier's statements.

It is true that the supplier may have proprietary evidence to support a certification and may not have revealed the full composition of a formulation under restrictions on the disclosure of confidential business information, in which case an independent evaluation is limited. Nevertheless, the customer can perform many checks based on the information presented by the supplier, and may as a standard practice adopt a review and validation of a supplier's certification statement.

Further, apart from any regulatory requirements, a number of industries have established their own internal standards that must be met in any procurement of raw materials by the company. For example, Volvo has established: VOLVO Corporate Standard STD 1009, 11 (Established February 1998) CHEMICAL SUBSTANCES WHOSE USE WITHIN THE VOLVO GROUP SHALL BE LIMITED (VOLVO'S GREY LIST).

Such ad hoc customer procurement standards that are in addition to any mandatory governmental requirement and accepted only in the face of market pressure have become widely accepted in part because of the difficulties and high error rates in certification documents being passed in the supply chain between supplier and customer. In addition, these standards are subject to change without notice. Such ad hoc standards increase the cost of compliance and its complexity, and reflect the need for an improved method of producing, distributing, and validating certification documents in the supply chain.

A customer has several validation needs:

-   Accuracy and Currency. Has the supplier correctly cited a supporting reference related to the safe handling of a product and is it current?
-   Access to Source Documents. Can the customer obtain the cited reference?
-   Access to Cited References. How can the customer obtain a cited sub-reference within the cited document?
-   Completeness. Are there other related restrictions or references that have been omitted or overlooked?
-   Global Scope. Are there similar restrictions in other countries or languages?
-   Customer's Use vs. Supplier's Scope. Are there other restrictions that apply to the customer's use in another market, but which the supplier has not directly addressed in the certification that are nonetheless critical to the customer (e.g., the customer purchases a product in the U.S., and receives a U.S. certification document but intends to use it as a component or trans-ship it to another country)?
-   Change Management Regarding Supplier Statements. After a period of time subsequent to the first receipt of the certification how can the customer be informed if an important amendment or modification has occurred related to a certification for the product that the customer has purchased? Again, although many regulations require the automated updating of MSDS or other certifications in the event of a “significant” regulatory change, many recipients seek to independently review supplier information.
-   Change Management with Regard to Customer's Uses. After a period of time subsequent to the first receipt of the certification how can the customer be informed of other related changes of interest but not provided by the supplier that may affect the customer's use of the product, for example, in a country to which transshipment occurs?
-   Upstream Supplier Certifications. Access to upstream supplier certifications relating to the immediate supplier's product or changes in these certifications under authorized terms and conditions acceptable to the upstream supplier.

Today, suppliers and customers seek to establish checks within their business processes and to establish review systems within their organizations, but these are prone to error and oversight, especially in light of the complexity of global markets. The reason is straightforward: these review systems are separated from the certification document itself. The present invention provides a data processing system to support automatic validation and addresses this need.

There are many ways in which suppliers generate such certification documents either manually or by automated means within a system. For example, enterprise resource planning systems (ERPs) such as that of SAP (e.g., SAP EH&S) assist suppliers in automatically generating MSDS.

The components of such systems often include:

-   A composition database containing products and detailed composition and raw materials;
-   Properties tables or databases containing associated values, classifications, and restrictions applicable to substances and properties. Such property tables may also include the automated calculations from business rules;
-   Phrase libraries, sometimes with translations of phrases, that contain control statements to be included in generated documents;
-   Transaction control tables that include data that prevents or alerts the potential shipment, purchase, import, export, or sale of a product that may be forbidden;
-   Document databases that include the generated documents or other documents that may be associated with a product, substance, or process; and,
-   Business rule tables with conclusions or actions (Left Hand Side, LHS) that depend on criteria (Right Hand Side, RHS parameters). For example, if benzene is a component in a formulation greater than 0.1 percent used in the United States, then insert the phrase code associated with the conclusion “carcinogenic” into the properties table for this substance identifier.
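By way of illustration only, the benzene business rule in the last item above might be sketched as follows. The class name, the phrase code `PHR-CARC-001`, the market code `US`, and the layout of the properties table are hypothetical and are not taken from any particular ERP system.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Formulation:
    substance_id: str
    components: Dict[str, float]  # component name -> percent of the formulation
    market: str                   # market in which the formulation is used


# Hypothetical phrase code standing for the conclusion "carcinogenic".
CARCINOGENIC_PHRASE = "PHR-CARC-001"


def apply_benzene_rule(formulation: Formulation,
                       properties_table: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Criteria: benzene above 0.1 percent in a formulation used in the US.
    Conclusion: add the phrase code to the properties table for the substance."""
    benzene_percent = formulation.components.get("benzene", 0.0)
    if formulation.market == "US" and benzene_percent > 0.1:
        properties_table.setdefault(formulation.substance_id, set()).add(CARCINOGENIC_PHRASE)
    return properties_table
```

A formulation that fails either criterion leaves the properties table unchanged.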

There are a number of current limitations in such systems:

-   ERP and document authoring systems such as SAP EH&S do not today include a dynamic component, such as a hyperlink, in phrase libraries of control statements used in the creation of certification documents, one that permits the recipient to validate a control statement within a received document in an automatic manner;
-   ERP and document authoring systems do not provide for validation of control statements through automatic means in generated certification documents for products from within the generated documents;
-   Although it is common for a manufacturer to hyperlink from a product listing on a web-site page to a related MSDS or technical document associated with the product, for example, it does not exist that the control statements in the certification document hyperlink to the authoritative source document for that statement or data element.
-   Data processing systems do not exist to pass certification documents containing dynamic control statements with hyperlinks in business-to-business exchange of such between computer systems in computer readable form so that the control statements with hyperlinks can be extracted and placed in a database for further use.
-   As a result, such data processing systems do not today allow the generation of certification documents that permit automated third-party validation and change management support services in association with control statements made.
-   Such data processing systems do not use the loading and storage of certification documents with control statements using hyperlinks.
-   It is not possible to obtain direct access to upstream manufacturer control statements or certification documents as described through a central service and no general practice or data processing system exists to provide this information.

One of the most difficult tasks of regulatory managers within supplier and customer organizations is keeping up with new or modified regulations or standards. Such compliance tracking tasks focus on the raw materials purchased, the substances manufactured, the processes themselves, or the products sold or distributed. The regulatory manager may use enterprise systems, subscribe to publications, participate in trade organizations, or search the web for information about change.

Equally difficult is the task of determining or obtaining upstream raw material certifications for products obtained from the immediate supplier.

It is desirable, therefore, to provide the capability for such a regulatory manager to validate a dynamic control statement within a certification document by a hyperlink to the source document supporting the control statement. In addition, it is desirable to provide the capability for the recipient of a certification document to determine by clicking a hyperlink whether amendments, new requirements, or modifications that pertain to a control statement have occurred for a given period of time, for example, since the time that the certification document was generated.

It is desirable, therefore, to provide a system by which a recipient's computer system can receive a certification document with its control statement from an upstream supply chain actor in such a manner that the recipient can store and re-use these control statements in authoring a further certification document for a product where the parameters of use are dependent on the control statements of the upstream supplier. Further, the downstream recipient does not have a system by which he can validate the control statements of the upstream provider, if authorized.

There are many services in which a user can enroll to receive updates of journals and regulations with a customized scope defined by the user. Such services include:

-   Westclip on Westlaw
-   ECLIPSE on Lexis/Nexis
-   U.S. Federal Register

However, the regulatory manager, researcher, or document recipient is interested in changes that relate to the context of the certification document and a particular control statement within it, which at present means that the process of analyzing the control statements within a certification document is separate from and totally independent of the process of tracking changes. The complexity and discontinuity of these two important processes (receiving the certification document and determining changes that relate to such a document's control statements) increase the probability of accidental non-compliance.

In addition to the data processing system for self-validation of certification documents, it is desirable for the researcher to obtain relevant documents from a searching or indexing system that will return a compilation of documents that includes not only direct references to the search term for a material but also a synonym, identifier, translation, or reference to a class or group containing the search term as a member. It is also desirable if the document reference from such a search will return the document opened at the relevant page with the applicable direct reference, synonym, identifier, translation, or reference to a class or group containing the search term as a member. Finally, it is desirable if the researcher can obtain a subset of documents, for example, only those that have changed where the returned reference is to a document containing not only a direct reference, but also a synonym, identifier, translation, or reference to a class or group containing the search term as a member.

Publishers maintain large libraries of abstracts of knowledge in various areas related to science and business, among other fields. One example is the ILLUMINA® system published by CSA and another is SCOPUS® by Reed-Elsevier. Although such systems may contain links to the full-text documents associated with an abstract, they do not include either the search capabilities or validation system as described in this invention.

Referring now to FIG. 12, an example prior art search from Google® illustrates the need. In this instance, the researcher has searched for a material, specifically, a chemical, which is “crotonic acid”. Google® returns two thousand five hundred and twenty (2,520) document references. Entering a synonym, “(E)-2-Butenoic acid”, returns only twenty-two (22) document references. A Dutch synonym, “Crotonzuur”, returns no hits and the message: “Try different keywords”. This search illustrates both a searching display and a searching index that do not return document references that include a compilation of not only direct references, but also synonyms or translations of a material term. This is a common approach of existing search displays and indexing methods, for example, Google®, SCOPUS®, ILLUMINA®, and others.

In searching for documents relevant to a material, the researcher is often interested in documents that include a reference to a class of which the search term is a member. For example, if the user enters the term, “crotonic acid”, he or she may be interested in a document that refers to “Ungesättigte aliphatische Mono- and Dicarbonsäuren” because the meaning of this chemical class with many members includes the specific substance, crotonic acid. Similarly, if the user searches for “sodium chromate”, the user would be interested in documents that include a reference to “hexavalent chromium compounds”. Such indirect references to broad classes including the direct search term are not returned by the example searches above of Google®.

The search term, and the interest in a reference to a broad group, need not concern a chemical; it may also concern a foodstuff, biologic, or formulation. A comprehensive search for the term, “orange”, according to the present invention, should return a link to a document including a reference to “citrus fruits, except lemon and limes”. Current search systems may return synonyms (e.g., TOXNET) and may include related identifiers and translations of substance names, but do not include a systematic cross-referencing system for such; nor do they include parent classes within the context of the regulation or referenced document.

An identifier for a material is a particular type of synonym. Many such identification systems are used by regulatory or scientific organizations, where an alphanumeric code represents a material. For example, the European Union uses EINECS numbers to refer to existing chemicals. FDA has its own system, as do the governments of Japan and Korea. Other systems include color index numbers, etc. It is desirable to provide a system and method that spans any identifier, returning documents that include a reference, whether that reference is a synonym, translation, parent group or class, or identifier, in addition to any direct reference.
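The kind of search described above can be sketched, under stated assumptions, as a query expansion step followed by a plain text match. The record layout, the function names, and the identifier `ID-0001` are hypothetical; the material names are those used as examples in this description. The sketch is illustrative and is not the indexing method of the invention.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MaterialRecord:
    name: str
    synonyms: Set[str] = field(default_factory=set)
    translations: Set[str] = field(default_factory=set)
    identifiers: Set[str] = field(default_factory=set)
    parent_classes: Set[str] = field(default_factory=set)

    def own_terms(self) -> Set[str]:
        # Terms that denote the material itself (not its parent classes).
        return {self.name} | self.synonyms | self.translations | self.identifiers


def expand_query(term: str, records: List[MaterialRecord]) -> Set[str]:
    """Return the search term plus every cross-referenced term, including parent classes."""
    wanted = term.casefold()
    for record in records:
        if wanted in {t.casefold() for t in record.own_terms()}:
            return record.own_terms() | record.parent_classes
    return {term}


def search(term: str, records: List[MaterialRecord], documents: Dict[str, str]) -> List[str]:
    """Return ids of documents that mention the term or any cross-referenced term."""
    needles = [t.casefold() for t in expand_query(term, records)]
    return sorted(doc_id for doc_id, text in documents.items()
                  if any(needle in text.casefold() for needle in needles))


# Sample cross-reference records built from the examples in the text.
RECORDS = [
    MaterialRecord("crotonic acid", {"(E)-2-Butenoic acid"}, {"Crotonzuur"}, {"ID-0001"}),
    MaterialRecord("sodium chromate", parent_classes={"hexavalent chromium compounds"}),
]
```

With such records, a search for “Crotonzuur” would also return documents that mention only “crotonic acid”, and a search for “sodium chromate” would also return documents that refer only to “hexavalent chromium compounds”.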

Web-based search engines do not include such features whether in the simultaneous display of document links containing references to synonyms, translations, identifiers, and parent groups in addition to direct references or whether in methods used to index documents to extract references to such terms.

Current systems, including those noted above, do not:

-   Search with the scope and methods described above or in this invention; nor
-   Return a document opened at the relevant page with the term highlighted.

SUMMARY OF THE INVENTION

One embodiment of the present invention includes a system for validating at least a portion of a certification document for at least one material. The system includes a certification document including at least one dynamic control statement relating to and defining parameters of use for the at least one material, wherein the certification document is accessible by at least one recipient, and, wherein the dynamic control statement is validated by retrieving validation information relating to the dynamic control statement from a dynamic source of validation information.

Another embodiment of the present invention includes a method for indexing documents in a data processing system, the documents including a reference to at least one material, comprising: inputting a document into the data processing system; extracting at least one alphanumeric string from the document; determining relevant alphanumeric strings from the extracted alphanumeric strings by processing the extracted alphanumeric strings utilizing at least one algorithm by comparing, in sequence and in combination, the extracted alphanumeric strings with material terms in at least a dictionary database of common material terms; matching the relevant alphanumeric strings with materials alphanumeric strings stored in the data processing system; and storing the matched alphanumeric strings in respective matched records in the data processing system.
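A minimal sketch of the indexing steps recited above is given below for illustration. It assumes that comparing strings “in sequence and in combination” can be approximated by testing each extracted string and each run of up to three consecutive strings against the dictionary; the function names, the tokenizing pattern, and the data layout are assumptions and are not prescribed by the method.

```python
import re
from typing import Dict, List, Set


def extract_strings(text: str) -> List[str]:
    """Extract lower-cased alphanumeric strings (hyphenated runs kept together)."""
    return re.findall(r"[0-9a-z]+(?:-[0-9a-z]+)*", text.lower())


def relevant_terms(tokens: List[str], dictionary: Set[str], max_words: int = 3) -> List[str]:
    """Compare single strings and combinations of consecutive strings with the dictionary."""
    found = []
    for start in range(len(tokens)):
        for length in range(1, max_words + 1):
            if start + length > len(tokens):
                break
            candidate = " ".join(tokens[start:start + length])
            if candidate in dictionary:
                found.append(candidate)
    return found


def index_document(doc_id: str, text: str, dictionary: Set[str],
                   materials: Dict[str, str], index: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Match relevant terms to stored material terms and record the document under each match."""
    for term in relevant_terms(extract_strings(text), dictionary):
        material_id = materials.get(term)  # stored material term -> material record id
        if material_id is not None:
            index.setdefault(material_id, set()).add(doc_id)
    return index
```

In this sketch a dictionary term that has no stored material record, such as a generic word like “acid”, is found relevant but is not stored.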

Another embodiment of the present invention provides methods for generating, distributing, validating, and searching documents about products that include standardized phrases that are claims made about the compliance of the product with guidelines, standards, and laws or that are properties of the product supported by a bibliographic reference to a literature reference.

Yet another embodiment of the invention provides for a database of standard phrases (hyperlinked standard phrase database), each with its own unique identification code, a text phrase that defines a specific claim or statement, optional translated variants of the text phrase in a one-to-one relationship to the unique identification code, and a hyperlink to a server that can retrieve a document or translation supporting the specific claim made by the standard phrase. In this manner, the Hyperlinked Standard Phrase Database can be distributed and used by many parties in the supply chain so that any document generated that includes the phrase will have a standard meaning and any recipient of the document can validate the statement through clicking on the hyperlink.
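For illustration, one record of such a hyperlinked standard phrase database might be modeled as shown below. The field names, the phrase code `P-0001`, the phrase text, the rendering format, and the URL under `validation.example` are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StandardPhrase:
    code: str                  # unique identification code
    text: str                  # text phrase defining the claim or statement
    source_url: str            # hyperlink to the server that retrieves the supporting document
    translations: Dict[str, str] = field(default_factory=dict)  # language -> translated variant


def statement_for(db: Dict[str, StandardPhrase], code: str, language: str = "en") -> str:
    """Render the phrase for a generated document, keeping its code and validation hyperlink."""
    phrase = db[code]
    text = phrase.translations.get(language, phrase.text)  # fall back to the default text
    return f"{text} [{phrase.code}] <{phrase.source_url}>"


PHRASE_DB = {
    "P-0001": StandardPhrase(
        code="P-0001",
        text="Suitable for contact with dry foodstuffs only.",
        source_url="https://validation.example/source/P-0001",
        translations={"de": "Nur bei trockenen Lebensmitteln verwenden."},
    )
}
```

Because every translated variant is keyed to the same identification code and hyperlink, a recipient reading any language variant reaches the same supporting document.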

Another embodiment of the invention provides for the Hyperlinked Standard Phrase Library Database including a hyperlink that will return an index, compilation or reference to all changes in the source documents relating to the statement.
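As a sketch only, the service behind such a change hyperlink could filter a change log by phrase code and by date, for example the date on which the certification document was generated. The record fields and the function name are assumptions made for this illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class ChangeRecord:
    phrase_code: str   # standard phrase whose source documents changed
    changed_on: date   # date of the amendment, addition, or deletion
    summary: str       # short description of the change


def changes_since(log: List[ChangeRecord], phrase_code: str, since: date) -> List[ChangeRecord]:
    """Return changes for one phrase made after the given date, oldest first."""
    matches = [c for c in log if c.phrase_code == phrase_code and c.changed_on > since]
    return sorted(matches, key=lambda c: c.changed_on)
```

Passing the generation date of the received document as `since` yields only the changes the recipient has not yet seen.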

A further aspect of the method above is that it allows the author of the document to use a standard enterprise resource planning system (ERP), such as SAP, Oracle, or other document authoring system to include standard phrases from the phrase database as well as his or her own phrases in a flexible manner so that the document can contain a combination of phrases that are standard, other phrases from upstream suppliers of raw materials, and phrases inserted by the document's author. The standard phrases can be selected based on need by the document's author in a completely free and flexible manner. What is new is that the standard phrase will have a hyperlink to a document or function retrieving the relevant text. In addition, the standardization of the hyperlinks permits the recipient of a document in any form that accepts hyperlinks to retrieve source documents associated with a claim from a centralized service that can route and retrieve hyperlinked source documents from any server wherever located.

A further aspect of the method above is that it allows for a system of user authorization associated with standard phrases contained in generated product documents. In this manner, an upstream supplier can pass to his direct customer a statement with a hyperlink to a secure server open only to authorized users. Let us suppose that the statement is public but the source documents supporting the claim are confidential. Thus, the customer can include the statement in an authored document and distribute the document to his downstream customers, but access to the confidential supporting document is controlled by the provider of the standard phrase.
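The separation between a public statement and a confidential source document can be sketched as follows. The class, the set of authorized users, and the use of a raised `PermissionError` to refuse access are assumptions made for illustration; an actual secure server would rely on its own authentication and authorization procedure.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ProtectedPhrase:
    code: str
    text: str              # public statement, free to pass downstream
    source_document: str   # confidential supporting document
    authorized_users: Set[str] = field(default_factory=set)  # maintained by the phrase provider


def public_statement(phrase: ProtectedPhrase) -> str:
    """Anyone in the supply chain may read and redistribute the statement itself."""
    return phrase.text


def open_source_document(phrase: ProtectedPhrase, user: str) -> str:
    """Only users authorized by the provider of the phrase may open the source document."""
    if user not in phrase.authorized_users:
        raise PermissionError(f"{user} is not authorized for phrase {phrase.code}")
    return phrase.source_document
```

The provider of the phrase alone edits `authorized_users`, so a downstream customer can redistribute the statement without gaining the ability to grant access to the supporting document.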

Another embodiment of the present invention also relates to a method for indexing documents including a description of at least one material in a data processing system. The method includes inputting a document into the data processing system; extracting at least one alphanumeric string from the document; determining relevant alphanumeric strings from the extracted alphanumeric strings by processing the extracted alphanumeric string utilizing at least one algorithm by comparing, in sequence and in combination, the extracted alphanumeric strings with material terms in at least a dictionary database of common material terms; matching the relevant alphanumeric strings with materials alphanumeric strings stored in the data processing system; and storing the matched alphanumeric strings in respective matched records in the data processing system.

Another embodiment of the present invention provides methods for searching document management systems and an improved means of efficiently locating, searching, and categorizing documents stored on Internet web-sites to enable identification and cross-referencing of references to dangerous chemicals within documents in any language.

One aspect of the present invention provides for post-processing of identified documents to permit a document to be opened at a relevant page with the term of interest highlighted, and in the integration of found documents into a validation system supporting certification documents transmitted between supplier and recipient.

Another aspect of the present invention provides a validation that can be established from within the document itself in an analogous manner to the checks that occur in giving a credit card to a merchant in order to effect a purchase. A third-party service supports the security of the transaction between merchant and customer, which reduces fraud and improves the efficient functioning of markets, based on the credit card itself.

Another aspect of the present invention provides a review system and related validation technologies based on the certification document itself.

One aspect of the present invention provides for a system of self-validating documents with independent validation support services. Another aspect of the present invention provides self-validating certification documents passed between supplier and recipient based on standard phrases to be included in such documents together with validation hyperlinks that invoke a series of services, including: a) the retrieval of a cited document opened at a referenced page with a highlighted section associated with a material, material class, topic, use, or legal citation; b) the retrieval of summary reports of all requirements related to the standard certification phrase; c) the retrieval of all amendments, additions, and deletions of requirements related to the certification phrase; d) the retrieval of related property data records that may be automatically loaded into a document management or enterprise resource planning system; and e) the retrieval of transaction control alerts. One embodiment of the present invention provides for the generation, indexing, extraction, and formation of documents containing validation links.

Another aspect of the present invention provides a system of self-validating documents through which a downstream recipient of a certification document can assure himself of upstream certifications related to the submission of the immediate supplier in a confidential manner acceptable to an upstream supplier. Another aspect of the present invention provides self-validating certification documents passed to the validation service by the upstream supplier, including a standard phrase, a validation hyperlink to the source document, and an authorization procedure. If the downstream user accepts or meets the conditions of the authorization procedure, a series of services is made available that relate to the upstream certification in the context of the immediate supplier's product or use, including: a) the retrieval of a cited upstream supplier document opened at a referenced page with a highlighted section associated with a material, material class, topic, use, or legal citation; b) the retrieval of summary reports of all requirements related to the standard certification phrase, including the supplier's confidential certification statements; c) the retrieval of all amendments, additions, and deletions of requirements related to the certification phrase, including the upstream supplier's confidential certifications; d) the retrieval of related property data records that may be automatically loaded into a document management or enterprise resource planning system, including upstream supplier data; and e) the retrieval of transaction control alerts, including transaction control alerts that relate to the upstream supplier's certifications.

Another aspect of the present invention provides for the generation, indexing, extraction, and formation of documents containing validation links.

Yet another aspect of the present invention permits the user to search for a standard synonym and return a highlighted reference to a literal reference on the page of a document, which provides the capability to link proper synonyms to literal names found within the text of a document that may not be “acceptable”, as well as the ability to link in synonyms based on confidential upstream supplier references.

Another aspect of the present invention provides linking not only to sets of chemical substances but to any “material” that may not be a chemical in the proper sense at all and that may be a biological agent, product, or concept (‘sweeteners’), including but not limited to confidential upstream supplier materials.

Another aspect of the present invention provides a system of self-validating documents including direct submissions between a supplier and a recipient as well as multiple party submissions through a chain of supplier-user relationships.

An aspect of the invention provides for supporting the validation system in its method to search and index documents in order to extract references to materials, material classes, and legal citations.

An aspect of the invention provides for supporting the validation system through access to upstream supplier certifications or certification documents through an authorization system.

Additional objects and advantages of the present invention will be apparent in the following detailed description read in conjunction with the accompanying drawing figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system for validating certification documents according to one embodiment of the present invention.

FIG. 1A is a block diagram of a system for validating certification documents according to one embodiment of the present invention.

FIG. 2 is a block diagram of an alternative embodiment of the system of the present invention.

FIG. 3 is a chart illustrating an example of a hyperlinked standard phrase library of the system of one embodiment of the present invention.

FIG. 4 is a diagram of a certification document including a dynamic control statement according to one embodiment of the present invention.

FIG. 5A is a block diagram of a data processing system according to one embodiment of the present invention.

FIGS. 5B and 5C are flowcharts of a method for indexing documents in a data processing system according to one embodiment of the present invention.

FIG. 6 is a diagram of a database produced by the method for indexing documents of one embodiment of the present invention.

FIG. 7 is a diagram of a result of a search for a chemical term according to one embodiment of the present invention.

FIG. 8 is a diagram showing an alphanumeric string in a document to be indexed utilizing the method of one embodiment of the present invention.

FIG. 9 is a chart illustrating a materials database of one embodiment of the present invention.

FIG. 10A is a block diagram of an alternative embodiment of a data processing system described herein.

FIGS. 10B and 10C are flowcharts of an alternative embodiment of a method for indexing documents in a data processing system described herein.

FIG. 11 is a diagram showing a materials database of one embodiment of the present invention.

FIG. 12 is a diagram showing results of a search for a materials term according to the prior art.

DETAILED DESCRIPTION OF THE INVENTION

“certification document”—As used in this invention, a certification document comprises: a purchase order, advanced shipment notice, shipping document, material safety data sheet, compliance certification statement, customer procurement standard, compliance statement, technical dossier, label, guideline, legislation, regulation, and standard as well as any document, submission, or compilation required for REACH (Registration, Evaluation, Authorization and Restriction of Chemicals under any related requirements).

“control statement”—As used in this invention, a control statement is a phrase intended to be included in a document to communicate the parameters of use of a material.

“authoring system”—As used in this invention, an authoring system is a software application employed for one of the preparation, generation, and distribution of a certification document.

“dynamic control statement”—As used in this invention, a dynamic control statement is a control statement that includes a function that can be executed by the recipient to retrieve one of a document, document reference, document link, compilation, summary, data, and function to perform the same. An example of a function that is included in a dynamic control statement is a hyperlink. An illustration of such a dynamic control statement might be “This product complies with EU Directive 2002/72/EC” found on the worldwide web at decernis.com/reference/document/2002_72_en.pdf.

“parameter of use”—As used in this invention, a parameter of use is a restriction, limitation, approval, guideline, standard, practice, recommendation, characteristic, behavior, measure, and data for a material.

“recipient”—As used in this invention, a recipient is a human being or a computer system.

“validation information”—As used in this invention, validation information is a dynamic control statement, source document, compilation, summary, alert, recall, and reference supporting said dynamic control statement.

“data processing system”—As used in this invention, a data processing system is one or more programmable electronic devices that can store, retrieve, and process data.

“logical function”—As used in this invention, a logical function is a business rule or programmable computer routine that performs a calculation with variables and returns a result. For example, a business rule expressed descriptively is “if the concentration of a component of a mixture is above 0.1% and that component is a listed carcinogen, then insert the control statement database key which is associated with the phrase ‘Carcinogenic’.” An example of a logical function expressed descriptively is “if the web server receives a request from a dynamic control statement for a variable associated with the retrieval of a document from the server's document database, then a check should be performed of the user's authorization in the authorization database”.
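By way of illustration only, the carcinogen business rule quoted above might be sketched as follows. The record layout, the function name, and the key value `PHRASE_CARCINOGENIC` are assumptions made for this sketch and do not appear in the specification; only the 0.1% threshold and the "listed carcinogen" condition come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Threshold from the descriptive rule; the key name is a hypothetical
# stand-in for the control statement database key of 'Carcinogenic'.
CARCINOGEN_THRESHOLD_PERCENT = 0.1
CARCINOGENIC_PHRASE_KEY = "PHRASE_CARCINOGENIC"


@dataclass(frozen=True)
class Component:
    """One component of a mixture (assumed record layout)."""
    name: str
    concentration_percent: float


def control_statement_keys(components: Iterable[Component],
                           listed_carcinogens: Set[str]) -> List[str]:
    """Return the control statement keys triggered by the mixture."""
    keys: List[str] = []
    for component in components:
        above = component.concentration_percent > CARCINOGEN_THRESHOLD_PERCENT
        listed = component.name in listed_carcinogens
        # Insert the key once, however many components trigger the rule.
        if above and listed and CARCINOGENIC_PHRASE_KEY not in keys:
            keys.append(CARCINOGENIC_PHRASE_KEY)
    return keys
```

The rule is read strictly here: a concentration of exactly 0.1% is not "above 0.1%" and therefore does not trigger the phrase.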

“material”—As used in this invention, a material means a chemical, formulation, biological product, any virus, therapeutic serum, toxin, antitoxin, or analogous product, and finished article. Examples of materials include but are not limited to formaldehyde, perfume, compounds, Irganox, processed foods, and serums. An example of a finished article is a toy.

“document”—As used in this invention, a document means a computer file whether as a whole or deconstructed into component parts for electronic transmission and communication.

“alphanumeric string”—As used in this invention, an alphanumeric string means a sequence of computer codes representing letters, numbers, and control characters, such as a line ending and punctuation mark.

“algorithm”—As used in this invention, an algorithm is a computer procedure that begins in an initial state and terminates in a definite end state, applied here to process alphanumeric strings to prepare for, compare, determine the relevance of, and terminate indexing and searching related to a material term. For example, an algorithm to prepare an individual alphanumeric string for matching with a material term is to strip all punctuation codes, raise all letters to capital, and to store the result in temporary memory. Another example of an algorithm is to compare any given alphanumeric string after processing with a database of material terms similarly processed and sorted in order of longest terms first.
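The two example algorithms above can be sketched, under stated assumptions, as a normalization step followed by a longest-first comparison. The function names are hypothetical, ASCII punctuation is assumed to be the set of "punctuation codes", and the comparison is implemented as a containment test so that the longest-first ordering has an observable effect; the specification does not prescribe any of these choices.

```python
import string
from typing import Iterable, List, Optional, Tuple

# Translation table that deletes ASCII punctuation (an assumption about
# which "punctuation codes" are stripped).
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def normalize(alphanumeric_string: str) -> str:
    """Strip punctuation and raise all letters to capitals."""
    return alphanumeric_string.translate(_PUNCTUATION_TABLE).upper()


def build_term_index(material_terms: Iterable[str]) -> List[Tuple[str, str]]:
    """Normalize the material terms and sort them longest first."""
    pairs = [(normalize(term), term) for term in material_terms]
    return sorted(pairs, key=lambda pair: len(pair[0]), reverse=True)


def match_material(alphanumeric_string: str,
                   term_index: List[Tuple[str, str]]) -> Optional[str]:
    """Return the first (hence longest) material term found in the string."""
    candidate = normalize(alphanumeric_string)
    for normalized_term, original_term in term_index:
        if normalized_term and normalized_term in candidate:
            return original_term
    return None
```

Sorting longest first means that a string mentioning "polyethylene glycol" is attributed to that term rather than to the shorter term "polyethylene" that it also contains.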

“sequence”—As used in this invention, a sequence is one or more alphanumeric strings extracted from a document in a defined order. For example, the order of extraction includes but is not limited to the presentation of columns within a document and processing in order within a column. Another example is that the order of extraction of alphanumeric strings should follow the natural order of the language, i.e., from left-to-right in English.

“dictionary database”—As used in this invention, a dictionary database is a collection of records stored systematically in an electronic medium so that it may be queried.

“common words”—As used in this invention, common words means a dictionary of terms that are ignored as noise in indexing.

“matching”—As used in this invention, matching is a procedure that terminates in accepting or rejecting an alphanumeric string extracted from a document to determine whether it is identical to a term describing a material stored in a database.

An embodiment of the invention is a data processing system that improves the capability of a recipient of a certification document to validate dynamic control statements made in the document to define the parameters of use of a product by including within the document hyperlinks that retrieve the source document or reference supporting the given dynamic control statement from a web server (see FIG. 1). This embodiment has the effect of communicating data about the parameters of use of a product to improve product safety and compliance.

Referring now to FIGS. 1 and 1A, a data processing system using hyperlinked dynamic control statements in a certification document is illustrated according to one embodiment of the present invention. A certification document 152 is generated by an author from a client computer 100 via a document authoring application 101 that resides on a document authoring server 151 for a product sent to a recipient 153. Dynamic control statements for a product retrieve, e.g., by hyperlink, an authoritative source document 102 and are included in an electronic certification document 104 for electronic transmission to the recipient 153 via a network. However, a recipient may also receive the electronic document via a computer readable medium, such as XML or other electronic document exchange, or by posting or sending a link to a secure web server, discussed in more detail below, to the recipient 153. The recipient 153 validates a dynamic control statement, for example, by clicking on a hyperlink in the electronic certification document 104 that returns from a validation server 154 at least one of the following sources of validation information: (a) a source document 107 supporting the dynamic control statement; (b) a list of change(s) 109 relevant to the dynamic control statement; (c) a compilation 108 of references related to the dynamic control statement; and (d) alerts 110. The validation server 154 is preferably a dynamic source of validation information, as the information is preferably updated at least at predetermined intervals.

The request 113, which may be in a number of different forms, including but not limited to HTTP, SOAP, remote object function calls, web services protocols, and the like, is passed to the web server 106. The web server 106 passes the request to the application server 111, which returns a response 114 to the request.

A hyperlinked standard phrase library 102 includes for each dynamic control statement at least one of the following and preferably including a hyperlink, hyperlink fragment, or other variable that serves to invoke a function: a unique identification code, a phrase identification code, a language code, a text string, and a hyperlink or hyperlink fragment stored in a database. In a preferred embodiment each dynamic control statement may have translated variants associated with dynamic components, e.g., hyperlinks, that will return from a server the source document supporting the statement. In another preferred embodiment the hyperlink may be fully formed or a unique fragment associated with the unique phrase code which when appended to a base URL address where the server application has been loaded will return from a server the source document supporting the statement.

A dynamic control statement with a hyperlink in the present invention can be included in a certification document, advantageously providing the recipient 153 with the capability to validate it.

The hyperlinked standard phrase library 102 is preferably integrated into a document authoring application 101, for example an enterprise resource planning system, such as SAP R/3, SAP EH&S, product life cycle management system, or material safety data sheet generation system. The author of the document 100 to be prepared for a material stored in the Product Composition Database 103 selects unique dynamic control statements with the hyperlinks that represent parameters of use to be included in the certification document for a given product. For example, when a food contact certification document is prepared for a particular grade of polyethylene being manufactured, an author selects the phrase “Kunststoffverordnung Nr. 476/2003” (with its associated dynamic component, e.g., hyperlink) to indicate the compliance of the manufactured product with an applicable food contact regulation in Austria. The author electronically places into a certification document the phrase with its hyperlink to the authoritative document so that the recipient can validate the dynamic control statement independently.

The generated certification document 152 may be in any form that accepts a hyperlink or logical function to retrieve validation information for a dynamic control statement 104, including HTML, RTF, Microsoft Word, Excel, Adobe PDF, or in a structured data format such as XML but not limited thereto.

The document received by the recipient 153 includes dynamic control statements with a hyperlink or logical function to retrieve the source document accessible from a validation server 154. The recipient of the certification document who wishes to examine the authority for a particular dynamic control statement may request 113 the source document from the hyperlink through the validation server. The web server 106 receives the request that is processed by the validation application 111 to retrieve the requested data from the database 112. The recipient is directed to the source document 114 through the web server 106.

FIG. 3 illustrates one example of an embodiment of the present invention for a Hyperlinked Standard Phrase Library of dynamic control statements with hyperlinks, providing a more detailed illustration of 102 in FIG. 1. In this example of a preferred embodiment, a unique identification code 300 groups dynamic control statements that have been translated into different languages so that the document authoring application 101 can produce a certification document 152 in FIG. 1 in any of the languages available in the Hyperlinked Standard Phrase Library. The Phrase Code 301 identifies the specific dynamic control statement in the database. The Language Code 302 defines the language of the text phrase, i.e., the dynamic control statement, 303. In addition, the hyperlink or hyperlink fragment 304 references the source document or documents supporting the specific dynamic control statement in any available language. The address of the hyperlink can refer to any one of the following to produce the results 107, 108, 109, or 110 for a dynamic control statement: a specific document, compilation, or summary in the document database 112 of FIG. 1, an index pointing to a document, compilation, or summary, or an argument of a function that will retrieve or generate a requested document, compilation, or summary.
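As a non-limiting sketch, one record of the Hyperlinked Standard Phrase Library described above might be represented as follows. The class, field, and function names are assumptions chosen to mirror reference numerals 300 to 304, and the base URL shown in the usage note is a placeholder rather than an address of any actual validation server.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PhraseRecord:
    unique_id: str      # 300: groups translated variants of one statement
    phrase_code: str    # 301: identifies the specific statement
    language_code: str  # 302: language of the text phrase
    text: str           # 303: the dynamic control statement itself
    hyperlink: str      # 304: fully formed hyperlink or a fragment


def resolve_hyperlink(record: PhraseRecord, base_url: str) -> str:
    """Return a full hyperlink, appending a fragment to the base URL."""
    if record.hyperlink.startswith(("http://", "https://")):
        return record.hyperlink  # already fully formed
    return base_url.rstrip("/") + "/" + record.hyperlink.lstrip("/")


def find_variant(library: Iterable[PhraseRecord], unique_id: str,
                 language_code: str) -> Optional[PhraseRecord]:
    """Find the translated variant of a statement in a given language."""
    for record in library:
        if record.unique_id == unique_id and record.language_code == language_code:
            return record
    return None
```

For example, a record whose hyperlink is the fragment `doc?id=1` resolves against the placeholder base `https://validation.example/` to `https://validation.example/doc?id=1`, while a fully formed hyperlink is returned unchanged.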

Referring now to FIG. 2, a data processing system using hyperlinked dynamic control statements in a certification document 252 is illustrated according to one embodiment of the present invention in which the certification document 252 is transmitted in computer readable form. In this embodiment of the present invention the certification document 252 is produced in a computer readable form, including XML or other electronic document exchange formats, in such a manner that a dynamic control statement 204 can be received by the recipient's enterprise system 253 via the receiving computer 215. The dynamic control statement can then be extracted and stored in the recipient's database 216.

Again referring to FIG. 2, in this embodiment of the present invention the recipient can produce electronic reports from the database 216 in the enterprise system 205 that include dynamic control statements received in certification documents for a product. The recipient using the enterprise system 253 can validate such dynamic control statements by clicking on a hyperlink in any such generated recipient electronic report, which will produce a request 213 that will return (214) from the validation server 254 at least one of the following sources of validation information: (a) a source document 207 supporting the dynamic control statement; (b) changes 209 relevant to the dynamic control statement; (c) a compilation 208 of references related to the dynamic control statement; and (d) alerts 210. The validation server 254 is preferably a dynamic source of validation information, as the validation information is preferably updated at least at predetermined intervals.

In a further embodiment of the present invention, the recipient can implement a document authoring system as in FIG. 1 and load the received dynamic control statements with hyperlinks or logical functions into the recipient's equivalent Hyperlinked Standard Phrase Library as in 102 of FIG. 1, thus permitting the reuse of dynamic control statements in a plurality of document authoring systems for any dynamic control statement passed through the supply chain. In an embodiment, the recipient's Hyperlinked Standard Phrase Library includes at least one of the following with hyperlinks: standard dynamic control statements, the recipient's own dynamic control statements, and received dynamic control statements. The recipient who has implemented a data processing system of the present invention can generate certification documents including dynamic control statements passed to the recipient.

One embodiment of the present invention provides that a certification document with such hyperlinked dynamic control statements can be passed to a recipient in a computer readable medium so that the recipient can then include a received hyperlinked dynamic control statement in authoring a further certification document. In consequence, the embodiment provides a data processing system to improve the means of passing information in a supply chain to control the safe use and compliance of products in a standard manner and in a manner that permits the validation of a dynamic control statement included in a certification document being transmitted in a supply chain.

The present invention provides an improved capability to prepare and distribute hyperlinked standard phrase libraries that are industry, subject-specific, or within a supply chain with hyperlinks and optionally translation variants so that a manufacturer may prepare a certification document in a flexible way with any selection of applicable hyperlinked dynamic control statements according to the conclusions of the expert author of the certification document and thus to permit the recipient to evaluate the stated parameters of use of the product by reviewing the source document supporting each dynamic control statement made.

Further, one embodiment of the present invention improves upon current practice by providing a method by which the author of a certification document may add locally authored dynamic control statements to the hyperlinked standard phrase library or insert in a certification document hyperlinked dynamic control statements received from upstream suppliers of raw materials. This improves the capability for upstream suppliers to communicate parameters of safe and compliant use of materials through certification documents. The method includes a data processing system that can be implemented by both direct recipients of a certification document as well as downstream recipients of a dynamic control statement associated with the raw materials of a value-added product to validate the source documents or references associated with any given dynamic control statement.

Through the availability of translated variants, documents generated with a hyperlinked dynamic control statement may be in any language. Such translated variants may be required where the use of the document is in a country with more than one national language, for example, Belgium, Canada, or Switzerland.

A further embodiment of the invention improves upon the capability of the recipient of a certification document 152 to determine changes in associated source documents relevant to an included dynamic control statement 104 for a given period of time. Referring to FIG. 1 and FIG. 3, the recipient of a certification document 152 can request 113 changes 109 relevant to a dynamic control statement 104 in a certification document 152 for a given period of time. In an embodiment the argument of the hyperlink of a dynamic control statement can include a date in order to request (113) changes (109 in FIGS. 1 and 3) relevant to the source documents supporting a given dynamic control statement 104. For example, the argument of a hyperlink may include the date of the certification document's 152 generation in order that the recipient can be informed of amendments to a regulation cited by the dynamic control statement.
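A minimal sketch of a date-bearing hyperlink argument, and of the server-side selection of changes it could drive, is given below. The parameter names `phrase` and `since`, the function names, and the tuple layout of an amendment record are all assumptions made for illustration; the specification states only that the argument can include a date.

```python
from datetime import date
from typing import Iterable, List, Tuple
from urllib.parse import urlencode

# An amendment is modelled as (effective date, description); this layout
# is an assumption of the sketch.
Amendment = Tuple[date, str]


def changes_link(base_url: str, phrase_code: str, generated_on: date) -> str:
    """Build a hyperlink whose argument carries the document's generation date."""
    query = urlencode({"phrase": phrase_code, "since": generated_on.isoformat()})
    return f"{base_url}?{query}"


def changes_since(amendments: Iterable[Amendment], since: date) -> List[Amendment]:
    """Return amendments effective after the given date, oldest first."""
    later = [a for a in amendments if a[0] > since]
    return sorted(later, key=lambda a: a[0])
```

Carrying the generation date in the link itself means the recipient does not have to supply it: the same certification document always asks for the changes that postdate its own creation.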

The invention provides a data processing system that improves upon one of the most difficult tasks for a recipient of a document in the supply chain, which is to determine whether important amendments or modifications have come into force, or further studies or standards, relevant to a specific claim made in a dynamic control statement. At present, this task is disadvantageously performed manually by the recipient. For example, if the text of the dynamic control statement is, “This product complies with EU directive 2002/72/EC for plastic materials in contact with foodstuffs” then the hyperlink request 113 to the validation server 106, 109 can return 114 references to all successive amendments to this directive since the time of the certification document's 152 generation. The invention thus provides an improved capability to provide a data processing system for standardized change management alerts for dynamic control statement 104 claims made in certification documents 152 to support the safe use of a product.

A further embodiment of the present invention is a data processing system to provide for product recalls and product alerts for specific shipments. In an embodiment of the present invention, an RFID identification number is associated with the certification document 152 or 252 or with one or more dynamic control statements 104 or 204 with hyperlinks or logical functions in order to return from the validation server 254 a product recall, alert, or other information applicable to a specific shipment.

The present invention improves the capability for a data processing system to provide alerts regarding product recalls 110, 210 or other information as described above from a validation server 106, 206 not only in relation to the product in general but to a specific shipment of that product identified by RFID identification code, such information regarding a product recall or alert being automatically provided in response to a request 113, 213 either from a human being 153 or from a computer system 253, which provides for the advanced shipment notice or other commercial business-to-business interaction in electronic document exchange format or in XML format to include a reference to a document generated with the Hyperlinked Standard Phrase Library and to an RFID identification of a specific shipment.

The described embodiments of the present invention improve the capability of the recipient to review MSDS and similar compliance certification documents, which otherwise must be carried out manually. The data processing system of the present invention further supports review of compliance and review of supplier certifications for foodstuffs, medical devices, pharmaceuticals, etc. The present invention is an improvement in that it is based on the document itself and not on an adjunct manual review process. An aspect of the present invention is to provide supply chain actors with a validation service bureau where such actors may connect the generation of regular documents and regulatory compliance document preparation with the source document validation management that the invention provides.

A certification document or safety document is self-validating upon review by the recipient. Currently a recipient must contact the supplier directly to validate documents.

Another embodiment of the present invention applicable to certification documents is that the hyperlink or logical function of the invention may open a document, may open a document at a cited page, or may open a document at a cited page with a relevant section of the page or document highlighted. In such a form, the hyperlink or logical function contains an argument that includes the page number(s) of the cited document and the sections of those pages to highlight, with the coordinates describing areas of the page to highlight. One example is to specify coordinates of the highlighted page by defining one or more rectangular areas of the page to highlight identified by the x,y positions of the axes of the rectangle that can be positioned on the space of a page. Other highlighted shapes or coordinate systems may be provided for within this invention. What is improved is the capability for the user of a certification document to click on the hyperlink or invoke a logical function so that the source document is returned from the validation server opened at the relevant page associated with the statement in the document, with the relevant section of the page(s) of the source document highlighted.

A further embodiment of the present invention provides access to a data processing system which is controlled by the level of service agreed to in a service agreement. In this aspect of the invention the hyperlink contains an additional argument which is an authorization code to control users authorized to view or download the source document referenced by the statement in the received document, which improves the general capability for upstream supply chain actors to provide documents containing statements referencing confidential information to intermediate and downstream actors in the supply chain. The downstream user can pass along the statement with the hyperlink or logical function to customers who may rely on this statement and be granted access based upon the privileges granted by the owner of the information hyperlinked with the statement, which improves the potential for a service bureau to manage authorization privileges for statements included in a Hyperlinked Standard Phrase Library. In this aspect of the invention it is not necessary for the service bureau to maintain the confidential documents themselves, only references to them with the reviewed authorization privileges.
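The authorization check described above might be sketched as follows, on the assumption that the service bureau keeps two simple mappings: one from document identifiers to references, and one from authorization codes to the identifiers each code may access. The function name, the mapping layout, and the example values in the usage note are hypothetical.

```python
from typing import Dict, Optional, Set


def resolve_reference(document_refs: Dict[str, str],
                      grants: Dict[str, Set[str]],
                      document_id: str,
                      auth_code: Optional[str]) -> Optional[str]:
    """Return the stored reference only if the code is granted access.

    The bureau holds references, not the confidential documents, so the
    value returned is a pointer to where the owner keeps the document.
    """
    allowed = grants.get(auth_code, set()) if auth_code else set()
    if document_id not in allowed:
        return None  # unknown code, missing code, or no privilege
    return document_refs.get(document_id)
```

For instance, with `grants = {"CODE-A": {"doc-1"}}`, a request for `doc-1` carrying `CODE-A` yields the stored reference, while the same request with no code or a different code yields `None`.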

To provide an example of the type of certification document depicted at 152, 252, FIG. 4 shows a certification document for a Polyethylene product containing dynamic control statements that the product meets required approvals for its use in several countries. Each dynamic control statement contains a hyperlink of the type defined in the invention that permits the customer to receive a document that can perform one or more of the following:

- hyperlink to the text of the applicable cited regulation, optionally opened to the relevant page with a highlight of a section of the page;
- hyperlink to a validation server that will return all regulatory changes for the dynamic control statement since the date of the document's generation for the context of the letter, for which the behavior of the returned format or information can be customized; this includes the capability for authorized users to obtain from a validation server the changes to the certifications of an upstream supplier;
- hyperlink to a validation server to return a summary report or compilation of other relevant requirements or restrictions of interest to the customer; this includes the capability for authorized users to obtain from a validation server a summary report including the certifications of an upstream supplier;
- hyperlink to a validation server to return a transaction control alert, such as “forbidden in transport by air”;
- hyperlink to a validation server to return data in a structured format.

In the example, 400 illustrates a dynamic control statement for the United States (401) while 402 illustrates a dynamic control statement for Sweden. Dynamic control statement 400 hyperlinks to document 403 with a preferred embodiment in which the section is highlighted 405.

400 illustrates a claim and instruction by the supplier for the safe use of the product, i.e., a dynamic control statement, that the product complies with 21 CFR 177.1520 (a)(3) (2003). In the simplest case, the user may be interested in an immediate review of the relevant source regulation opened at the relevant page with the section highlighted that relates to FDA regulation of “Olefin polymers” (as illustrated), a family of plastics that subsumes the specific product, Polyethylene. To produce this result, the supplier receives a Hyperlinked Standard Phrase Library (FIG. 1, 102) that contains identifiers, a phrase code, phrase texts in optionally different languages, and a validation hyperlink, as illustrated in FIG. 3. In this example embodiment the dynamic control statement is:

“FDA, CFR, Title 21 (2003), 177.1520 (a)(3)(i)(c)(1), (b) and (c) 3.1a. Olefin Polymers.”

The validation hyperlink for that phrase code is:

//decernis.com/reference/navpdf2.jsp?timestamp=6_5_2003&profile=1155&doc=2158789.pdf&pg=3&llx=156&lly=173&rux=196&ruy=183&lib=document

in which the validation hyperlink or logical function may contain one or more components, such as:

URL [http://decernis.com/reference]

Target function [navpdf.jsp]

Timestamp [6_5_2003]

Source

Identifier

Profile [1155]

Topic

Material

Document [2158789.pdf, page 3, and the document itself is to be retrieved]

XY positions [lower left x position (llx) at 156, lower left y position (lly) at 173, right upper x position (rux) at 196, right upper y position (ruy) at 183]

-   -   In consequence, the validation hyperlink or logical function can        alternatively contain, for example, systematic information that        can be associated with certification phrases in the document.
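The decomposition of the example validation hyperlink into its components can be sketched as follows. This is only an illustrative sketch, not the system's actual parser: the parameter spellings `timestamp`, `llx`, `lly` and `lib` are an assumed reading of the printed hyperlink, and the function name `parse_validation_link` and the keys of the returned dictionary are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Example hyperlink from the text; parameter spellings are an assumed reading.
LINK = ("//decernis.com/reference/navpdf2.jsp?timestamp=6_5_2003&profile=1155"
        "&doc=2158789.pdf&pg=3&llx=156&lly=173&rux=196&ruy=183&lib=document")


def parse_validation_link(link):
    """Split a validation hyperlink into the components listed above."""
    parts = urlparse(link)
    # parse_qs returns a list per key; each key occurs once in this example.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "host": parts.netloc,
        "target": parts.path.rsplit("/", 1)[-1],  # target function
        "timestamp": query["timestamp"],          # kept as the raw string
        "profile": int(query["profile"]),
        "document": query["doc"],
        "page": int(query["pg"]),
        # Highlight rectangle: lower-left x/y, then right-upper x/y.
        "highlight": tuple(int(query[k]) for k in ("llx", "lly", "rux", "ruy")),
    }


if __name__ == "__main__":
    for name, value in parse_validation_link(LINK).items():
        print(f"{name}: {value}")
```

The timestamp is deliberately left as a string, because the text does not state how its fields are ordered.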

In this manner, a database of hyperlinked dynamic control statements as a component of the data processing system of the present invention can be distributed to many different suppliers, permitting consistency and validation by recipients in the communication of the parameters of use of a product, while allowing for significant customization to meet the needs of a supplier. The validation database can include dynamic control statements of an upstream supplier that include a hyperlink to the source document contained in a database available only to authorized users. In this manner, the author of a document can assemble certification statements in a standardized as well as customizable manner that include both the immediate user's claims and upstream supplier claims, although the upstream supplier claims would only be accessible to an authorized user. As a result, certifications can be passed from multiple parties upstream in the supply chain, greatly simplifying the assessment task of a downstream user.

The supplier can use his or her document authoring environment to embed the appropriate phrase codes within a defined report template for a given product. The invention provides a component that can be used in ERP or document authoring systems.

Once the document authoring step has been completed, the supplier distributes the certification document in at least two ways:

-   -   The recipient may be an end-user (153), e.g., procurement expert
        for a customer reviewing compliance for global raw material
        acquisitions;
    -   The recipient may be another ERP system (253) automatically
        connected in a business-to-business network in which the
        transaction data and documents are passed from the supplier's
        ERP directly into the recipient's ERP system.

The end-user can open the document in a number of different formats (Adobe PDF, Microsoft Word, HTML, etc.) and click on the validation hyperlink within the document itself (401). Depending on the customization of the validation hyperlink, the user may invoke a number of different services from it:

-   -   The source document may be returned opened at the relevant page,
        optionally with a relevant section highlighted (107, 207);
    -   The validation service may provide a report of all amendments or
        modifications to the cited document since the date of the
        document's generation relative to the timestamp of the
        validation phrase code (109, 209);
    -   The validation service may return a summary report of other
        regulations within the topical context of the document as
        defined in a profile (108, 208);
    -   The validation service may return alerts, news of proposals
        relevant to the certification, or other transaction control
        information (110, 210).

In addition, if the document above contained a statement, such as “Raw materials used in processing comply with FDA requirements, according to supplier certification”, the source document would be opened as above, but only to an authorized user.

The supplier of the Polyethylene certification has cited a 2003 dated CFR in the above example. An obvious question for the recipient is whether FDA has promulgated any changes to the citation since the time that the document was created. Two related issues arise: a) was the supplier correct and current in citing the FDA approval; and b) have any changes occurred since? One aspect of the present invention provides self-validating documents that give a quicker and more effective answer to these questions.
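The second question, whether anything has changed since the document was generated, amounts to filtering a list of amendments by citation and date. The following is a minimal sketch under stated assumptions: the `Amendment` record, the `changes_since` function and all sample entries are hypothetical placeholders and do not describe real regulatory amendments.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Amendment:
    citation: str    # regulation the amendment applies to
    effective: date  # date the amendment took effect
    summary: str     # short description


def changes_since(amendments, citation, document_date):
    """Return amendments to `citation` that took effect after `document_date`."""
    matching = (a for a in amendments
                if a.citation == citation and a.effective > document_date)
    return sorted(matching, key=lambda a: a.effective)


if __name__ == "__main__":
    # Placeholder data only; dates and summaries are invented for illustration.
    known = [
        Amendment("21 CFR 177.1520", date(2002, 1, 1), "placeholder amendment A"),
        Amendment("21 CFR 177.1520", date(2005, 1, 1), "placeholder amendment B"),
        Amendment("21 CFR 175.300", date(2005, 1, 1), "placeholder amendment C"),
    ]
    for a in changes_since(known, "21 CFR 177.1520", date(2003, 6, 1)):
        print(a.effective.isoformat(), a.summary)
```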

Further, the customer may wish to trans-ship the received polyethylene [as the example in this case] to another country not included in the list of certifications. At this point, although the supplier may not have, in some cases, disclosed all information necessary for a conclusive answer by the customer, the customer may wish to make an independent evaluation based upon the information provided for a number of reasons. The customer's intended market or use may be perceived as confidential information that the customer may be unwilling to disclose to the supplier.

As a result, a rich set of validation services is provided on the basis of the system of validation of the invention, and these services are available to both supplier and recipient, as well as downstream in the supply chain.

Document Searching, Indexing, and Extraction of References to Materials and Material Classes: An aspect of the invention that supports the validation system is its method to search and index documents in order to extract references to materials and material classes. The invention provides methods for:

-   -   Indexing, searching, and extracting direct references,
        identifiers, synonyms and multi-lingual references to
        “materials” from a document (e.g., 608, 703, 704); and
    -   Indexing, searching, and extracting multi-lingual references to
        “material classes” from a document (e.g., 609, 701, 702).

A material in the database has a common identification even if it may be referenced by many other identifiers as used in regulations or documents. Although prior art has defined any number of different types of databases of substances, the present invention is unique in several respects:

-   -   The database has a superset and unique concept of material that
        cuts across and relates individual occurrences to it;
    -   The database structure links together proper names, synonyms,
        translations, identifiers, and literal names (i.e., alphanumeric
        sequences used in documents that refer to a material but may be
        erroneous or have ancillary alphanumeric characters associated
        with them), allowing a reference in a document, which may be
        entirely erroneous or in a different language, to be related
        back to both a proper synonym as well as to a larger concept of
        material; and,
    -   The database structure links all of these references together to
        the associated documents.

A material class is a superset containing one or more materials. An example of a class defined in many environmental, safety, and health regulations is “Chrome VI” compounds, which defines a particular membership of chromium compounds and includes sodium chromate. In order for the user of a material to meet applicable requirements, he or she must be aware not only of direct references to the substance but also of indirect references, which apply through parent-child relationships. For example, sodium chromate is a “child” of the “parent” class, “Chrome VI compounds”. In many cases, because of the legal definition of the regulation's scope, or more precisely the document's definitional context, the relationship in question may not be a scientific one but an arbitrarily defined one. As noted above, an automaker may define a set of materials that it has chosen not to purchase, as a matter of policy. Or, a document may refer to a particular list of salts, but not all salts of an acid.

Although prior art includes many uses of parent-child relationships in databases and of parent-child relationships of substances to groups, the present invention is unique in that:

-   -   Materials and material class references are related to their
        occurrence within documents;
    -   Materials and material classes are defined within the scope of a
        document or regulation;
    -   Materials can themselves be supersets of substances;
    -   Materials and material classes are structurally linked to
        multi-lingual occurrences and to literal name occurrences.

One embodiment of the present invention is a method and data processing system to cross-index references to chemicals and materials in documents not only by a direct reference to a material but also by one or more of the following: synonym, identifier, translation, or material class of which the chemical is a member.

An illustration from the prior art of the need for a method and data processing system to cross-index references to chemicals and materials in documents is given in FIG. 12, which provides results from three searches from Google Scholar 1202, found on the worldwide web at scholar.google.com/. The first example 1203 is a search for the chemical “crotonic acid” that returns two thousand five hundred and twenty document references (2,520) 1204. The second example is a search for a synonym of the chemical, “(E)-2-Butenoic acid” 1205, that returns twenty-two document references (22) 1206 that are not consistent with the references found in the first search. The third example is a search for a translation of the chemical, “Crotonzuur” 1207, that returns no search results from the Google Scholar index or search engine 1208. Similar results would occur for a chemical class of which crotonic acid is a member, such as “Ungesättigte aliphatische Mono- and Dicarbonsäuren C3-C8” or “Ácidos”. The approach to indexing chemical terms in the prior art embodied by other suppliers produces similar inconsistencies: for example, the same three searches would give similarly inconsistent results, though for a different document index library, if performed on the publicly accessible demonstration of Illumina, found on the worldwide web at csa.com/.

One embodiment of the present invention returns, from a search for a chemical or material term, all available document references matching the entered search term, synonyms of the search term, identifiers of the search term, translations of the search term, or classes of which the chemical or material is a member (see FIG. 7).

Referring now to FIGS. 5A, 5B, and 5C, an embodiment of the present invention is illustrated, providing the steps to index chemical entries in a document that includes a reference to at least one material, so that the data processing system can permit a search for a given chemical in order to return from a web server not only a direct reference but also a compilation of references to documents containing the chemical term including: (a) documents containing a synonymous reference to the chemical term; (b) documents containing a translated reference to the chemical term; (c) documents containing a reference to an identifier associated with the chemical term; (d) documents containing a reference to a parent class of which the chemical term is a member.

Referring now to FIG. 5C, in a method 501 at a step 502, the document is input, preferably in electronic format, such as HTML, Adobe PDF, Microsoft Excel, Microsoft Word, or in computer readable form such as XML or other document exchange format, and preferably input into a document database 530. The software of the present invention extracts all of the words or alphanumeric strings in the document in a step 503 and tests that each is a word and not simply a sequence of control codes or other information. In one embodiment of the present invention, a found word is compared to a dictionary of common words in order to suppress words that are not chemical terms, which has the effect of decreasing the size of the database and increasing the speed of the indexing. Once the words are found in the document, a record in the index database is stored. In a preferred embodiment of the present invention the word is stored, including the address of the document, whether a filename or hyperlink, the page on which the word is found, and the position of the word on the page. One method for defining the position of the term on the page is to define a rectangular area that encompasses the area of the word on the page, and to store the coordinates of the rectangle in the index record.
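The extraction and storage of index records described above can be sketched as follows. This is an illustrative sketch only: it assumes the words and their page rectangles have already been pulled out of the file by some document reader, and the names `IndexRecord`, `index_tokens` and `COMMON_WORDS`, the file name and the coordinates are all hypothetical.

```python
import re
from typing import NamedTuple


class IndexRecord(NamedTuple):
    word: str
    address: str  # filename or hyperlink of the document
    page: int
    box: tuple    # rectangle around the word: (llx, lly, rux, ruy)


# Assumed dictionary of common, non-chemical words.
COMMON_WORDS = {"the", "no", "not", "to", "exceed", "of", "by"}
HAS_ALNUM = re.compile(r"[A-Za-z0-9]")


def index_tokens(address, tokens):
    """Build index records from (text, page, box) tuples of one document."""
    records = []
    for text, page, box in tokens:
        if not HAS_ALNUM.search(text):
            continue  # control codes or punctuation only, not a word
        if text.lower() in COMMON_WORDS:
            continue  # suppress common words to keep the index small
        records.append(IndexRecord(text, address, page, box))
    return records


if __name__ == "__main__":
    tokens = [
        ("\x0c", 3, (0, 0, 0, 0)),
        ("Mineral", 3, (156, 173, 196, 183)),
        ("oil", 3, (200, 173, 215, 183)),
        ("Not", 3, (220, 173, 235, 183)),
    ]
    for record in index_tokens("example.pdf", tokens):
        print(record)
```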

In a step 504 the relevance of the extracted words or alphanumeric strings in the document as stored in the index is determined by comparing, in sequence and in combination, the extracted words or alphanumeric strings to a dictionary database of material terms. The material terms dictionary database contains at least one of the following: a unique material identification code, a preferred name of the instance of the material, and a name used for matching against a document term (Matchname). An extracted word or alphanumeric string is compared to an index of Matchnames from the database of material terms, such as by retrieving the extracted word or alphanumeric string from a document index, discussed in more detail below. If a match occurs in a step 505, an index record is prepared for at least one of the following: (a) found chemical, (b) document, (c) document address, (d) material code, after which the record is stored in the materials index in a step 506. As a result of this embodiment of the indexing system, a particular chemical name is indexed against any entry in a plurality of documents in the document database.

Referring to FIG. 5B, the sub-step 504 is shown in greater detail. After the alphanumeric strings have been extracted in the step 503, the first alphanumeric string is extracted in a step 535. In a step 536, the alphanumeric string is prepared for matching, such as by converting lower case characters to upper case, stripping punctuation, converting Greek letters to alphanumeric equivalents or the like. In a step 537, the alphanumeric string is compared with a dictionary database of material terms. In a step 538, if a match occurs, the method proceeds to the step 539 and the matched term is added to a stored set of matching material terms, after which the matching terms are combined in a step 541 and the method proceeds to the step 544 to test if there are more strings in the document. If there are no more strings, the sub-step 504 ends and the method 501 proceeds to the step 505. If there are more strings, the method returns to the step 535 to begin a new determination of relevance.

If no match occurs in the step 538, the method tests, in a step 540, whether any stored material terms exist. If a stored material term does exist, the method proceeds to a step 542 where a stored matching material term is retrieved, which is accepted as a material reference, and the method writes an index record in a step 543, after which the method proceeds to the step 544 to test if there are more strings in the document. If there are no more strings, the sub-step 504 ends and the method 501 proceeds to the step 505. If there are more strings, the method returns to the step 535 to begin a new determination of relevance. If no stored material terms exist in the step 540, the method proceeds to the step 544 to test if there are more strings in the document. If there are no more strings, the sub-step 504 ends and the method 501 proceeds to the step 505. If there are more strings, the method returns to the step 535 to begin a new determination of relevance.
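The loop of steps 535 to 544 can be read as a greedy match over consecutive strings: keep extending a candidate phrase while the dictionary still offers a match, and accept the last stored match once the next string no longer fits. The sketch below is one possible reading, not the patented implementation. The dictionary contents are illustrative (only the key 1476 for “Mineral oil” appears in the text; the other keys are invented), and retrying the non-matching string as the start of a new term is an added assumption.

```python
import string

# Matchname -> material key. Only 1476 comes from the text; the rest is invented.
MATERIALS = {
    "MINERAL OIL": 1476,
    "MINERAL OILS AND HYDROCARBONS": 9001,
    "MINERAL OIL BASED GREASES": 9002,
}


def prepare(token):
    """Step 536: upper-case and strip leading/trailing punctuation."""
    return token.strip(string.punctuation).upper()


def is_candidate(phrase):
    """True if the phrase is a dictionary term or the start of one."""
    return any(t == phrase or t.startswith(phrase + " ") for t in MATERIALS)


def determine_relevance(tokens):
    records = []                  # accepted (Matchname, key) pairs
    pending, stored = [], None    # growing phrase and last complete match
    for raw in tokens:            # step 535: next string
        token = prepare(raw)
        phrase = " ".join(pending + [token])
        if token and is_candidate(phrase):        # steps 537-538: match
            pending.append(token)                 # step 539
            if phrase in MATERIALS:
                stored = phrase                   # step 541
            continue
        if stored is not None:                    # step 540
            records.append((stored, MATERIALS[stored]))  # steps 542-543
        pending, stored = [], None
        if token and is_candidate(token):         # assumption: retry as new start
            pending = [token]
            stored = token if token in MATERIALS else None
    if stored is not None:                        # flush at end of document
        records.append((stored, MATERIALS[stored]))
    return records


if __name__ == "__main__":
    text = "3.1 Mineral oil (CAS Reg. No. 8012-95-1): Not to exceed 40 percent"
    print(determine_relevance(text.split()))
```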

Referring now to FIG. 5A, the materials database or dictionary database of common material terms is illustrated at 515 as including one or more of the following: (a) a unique material identification code or key 516; (b) a proper chemical name 517; (c) synonyms 518; (d) identifiers, e.g., an EU Reference Number, an EINECS number, etc. 519; (e) translations of chemical names 520; (f) members of parent classes, e.g., sodium chromate is a member of the class hexavalent chromium compounds; and other material attributes.

The document database is illustrated at 530 as including one or more of the following: (a) a unique material identification code or key 531, which is related to 516; (b) a document address or filename using the Matchname of the material; (c) a page number for the occurrence of the Matchname 533; and (d) the location on the page of the term 534.

A master materials database is shown at 523 and defines the relationship with a master “material” that encompasses one or more entries in the materials database 515. A master material record has one or more of the following elements: (a) a unique master materials code or key 524; (b) a proper master name 525; (c) master identifiers associated with the master material 526; (d) attributes associated with the master material 527; (e) an identification that this is a parent class.

After the method 501 is complete, a cross-index is generated by retrieving a matched alphanumeric string from the step 505, matching the retrieved alphanumeric string with a matching master materials database key 524 and cross-indexing the matching records in the document database 530, the materials database 515, and the master materials database 523. In this embodiment of the present invention, the resulting cross-index permits a particular occurrence of a chemical term to be related not only to all indexed documents in the document database 530 containing the term, but to any synonym, translation, identifier or membership in a broad chemical class in the materials database 515 or the master materials database 523.
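The relationship among the three databases and the resulting cross-index can be sketched with simple records. This is a hedged illustration, not the actual schema: the field names, the `master_key` link from a material entry to its master material, the key values and the page numbers are assumptions; only the reference numerals in the comments and the file name 21cfr176.180.pdf are taken from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:        # entry in the materials database 515
    key: int           # material identification key 516
    matchname: str     # name used for matching against document terms
    master_key: int    # assumed link to the master materials key 524


@dataclass(frozen=True)
class Occurrence:      # entry in the document database 530
    material_key: int  # key 531, related to 516
    address: str       # document address or filename
    page: int          # page number 533


def build_cross_index(materials, occurrences):
    """Group document occurrences under the master key of their material."""
    master_of = {m.key: m.master_key for m in materials}
    index = defaultdict(set)
    for occ in occurrences:
        index[master_of[occ.material_key]].add((occ.address, occ.page))
    return index


def search(term, materials, index):
    """Return occurrences of the term and of every name sharing its master key."""
    hits = set()
    for m in materials:
        if m.matchname == term.upper():
            hits |= index.get(m.master_key, set())
    return hits


if __name__ == "__main__":
    materials = [
        Material(1, "CROTONIC ACID", 100),
        Material(2, "(E)-2-BUTENOIC ACID", 100),
        Material(3, "CROTONZUUR", 100),
    ]
    occurrences = [Occurrence(1, "21cfr176.180.pdf", 1)]
    index = build_cross_index(materials, occurrences)
    print(search("Crotonzuur", materials, index))
```

A search for the translated name finds the document that only mentions “crotonic acid”, because both entries share one master key.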

Referring now to FIG. 6, the results of the indexing of chemical terms in documents are illustrated in an example of an embodiment of the present invention. An example master material key (516 in FIG. 5A) is illustrated at 601. If there is an associated reference to a master material key (524 in FIG. 5A) that is a parent class, the key entry is represented in 602. For example 608, in this instance the preferred name for the material entry is (E)-2-butenoic acid 603, indicated here as the “head_name” with a material identifier code 304 (601). One instance of a material linked to this index entry is the name, “crotonic acid” 604. This entry is found in the file, 21cfr176.180.pdf (607). An attribute of this entry is a citation, “21 CFR 176.180 Paper/paperboard (dry food)”.

In another instance 609, the material is a member of a chemical class “Ungesättigte aliphatische Mono- and Dicarbonsäuren C3-C8”, and this class is referenced in a document with the citation “BfR Empf 35 [XXXV.] Mischpolymerisate: Ethylen, Propylen, Butylen, Vinylestern, ungesättigten aliphat.” with the filename, de_350deutsch.pdf. (E)-2-butenoic acid is a member of the referenced class of chemicals.

FIG. 8 is an illustration of an example of a fragment from a document, 21 CFR §178.2010, with a sequence of words at 810, beginning “3.1 Mineral oil (CAS Reg. No. 8012-95-1): Not to exceed 40 percent by weight of the stabilizer formulation”. In applying the method 501 of the present invention for indexing chemical terms to this portion of the document, the chemical term of interest in this fragment is “Mineral oil”, which is embedded in a sequence of other alphanumeric characters and other non-alphanumeric punctuation, bit streams, and control codes. The first term of this sequence is extracted as indicated in steps 503 and 506. This alphanumeric sequence, “3.1”, does not match any term in the materials database and is discarded.

One difficulty with existing indexing methods that is addressed by the present invention is that a chemical term may span many separated words 901 as illustrated in FIG. 9. Extracting only the first word-length term in sequence, i.e., “Mineral”, would not correctly represent the term in this example extract. One advantageous aspect of this embodiment of the present invention is that the comparative indexing method in the step of determining relevance 506 heuristically evaluates a series of possibilities as it processes the document by considering a plurality of terms in sequence. In one embodiment of this method, the terms in the material database 515 are taken in order of the longest alphanumeric sequence first.

In an illustration of this approach, the indexing utility extracts the word, “Mineral” in the step 503 and stores this term temporarily until the indexing method 501 can conclude that the best match has been found for the series, rejecting or accepting terms as the processing continues. For instance, “Mineral” might possibly be followed by “reinforced nylon resins”, “Mineral oils and hydrocarbons”, “Mineral oil based greases”, or the like as shown at 901, although it is not in the document example 801. At this step the indexing method 501 has found a chemical term with a plurality of matches, and then seeks to narrow the possible matches by taking the second term, “oil” 801, following the term, “Mineral”. “Mineral oil” in combination eliminates all other available terms in this example, and the method determines that “Mineral oil” with the internal material identification key (516 from FIG. 5A) of “1476” (902 in FIG. 9) matches this series of words in the document in the step 509. This match is then stored in the step 510.

The indexing method 501 then moves sequentially to consider the following term, “(CAS”, and the subsequent term, “No.” 803, which are discarded without matches. In a preferred embodiment of this method, a database of common (i.e., nonmaterial) words may be additionally used to make the comparison more efficient by eliminating any common word, such as “the”, “No.”, “not”, “to”, “exceed”. A common word in such a list would be ignored by the indexing method 501. In consequence, if the database of common words is used in the heuristic comparison step 508, the words of the phrase “Not to exceed” would be discarded as “common words” or “noise” without comparison to the material database.

In a further aspect of this embodiment of the present invention, the CAS Registry Number, “8012-95-1” 803, is an example of an identifier for the substance, “mineral oil”, that can itself be extracted from the document by the indexing method 501 of the present invention and used in association with other material identification keys that are themselves linked to other document references.
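One way such an identifier might be pulled from running text is sketched below. The pattern and the check-digit test follow the standard CAS Registry Number format and checksum rule, which is general knowledge rather than something stated in this document; the function names are hypothetical.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def is_valid_cas(candidate):
    """Check the CAS check digit: weighted digit sum modulo 10."""
    match = CAS_PATTERN.fullmatch(candidate)
    if not match:
        return False
    # Weights 1, 2, 3, ... are applied from the rightmost body digit leftwards.
    body = (match.group(1) + match.group(2))[::-1]
    total = sum(weight * int(digit) for weight, digit in enumerate(body, start=1))
    return total % 10 == int(match.group(3))


def find_cas_numbers(text):
    """Return every well-formed CAS Registry Number found in the text."""
    return [m.group(0) for m in CAS_PATTERN.finditer(text)
            if is_valid_cas(m.group(0))]


if __name__ == "__main__":
    fragment = ("3.1 Mineral oil (CAS Reg. No. 8012-95-1): Not to exceed "
                "40 percent by weight of the stabilizer formulation")
    print(find_cas_numbers(fragment))
```

Validating the check digit helps reject numeric strings that merely resemble an identifier before they are linked to a material key.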

In an embodiment of the indexing method 501, the words are pre-processed to strip punctuation and capitalize terms so that the comparison step can consider a number of equivalent chemical terms more easily. Taking an example, such as “(+) 1,6-Di-(4-amidinophenoxy)-n-hexan”, whether the chemical appears with the use of parentheses or bracket characters does not matter, because the indexing method would treat the punctuation as noise. Similarly the embodiment would ignore case sensitivity, such as initial capitalization in a document reference.

Additionally, certain alphanumeric sequences taken together are considered as equivalent in the preferred embodiment of the indexing method. For example “Alphamuurolen”, “α-Muurolene”, and “α-Muuolene” are equivalent, permitting the “Alpha”, “a-”, and “α-” alphanumeric character sequences to be indexed as companion terms. Such terms can be in any position within the word, such as “2-Metossi-4-(2-propenil)fenil-β-D-glucopiranoside”, which should be treated synonymously to “2-Metossi-4-(2-propenil)fenil-beta-D-glucopiranoside”. These variants are assigned to the matching material key 516 or 531.
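The pre-processing described in the last two paragraphs can be sketched as a single normalization function. This is a minimal sketch under stated assumptions: the function name `matchname` and the short Greek-letter table are hypothetical, and mapping a plain “a-” prefix to “alpha” is deliberately left out because it would need chemical context.

```python
import re

# Assumed, deliberately short table of Greek letters and their spelled-out names.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma"}
NOT_ALNUM = re.compile(r"[^A-Z0-9]")


def matchname(term):
    """Normalize a chemical term for comparison against the dictionary."""
    for letter, name in GREEK.items():
        term = term.replace(letter, name)  # spell out Greek letters first
    term = term.upper()                    # ignore case
    return NOT_ALNUM.sub("", term)         # treat punctuation and spaces as noise


if __name__ == "__main__":
    for text in ("α-Muurolene",
                 "2-Metossi-4-(2-propenil)fenil-β-D-glucopiranoside",
                 "[+] 1,6-Di-[4-amidinophenoxy]-n-hexan"):
        print(matchname(text))
```

Stripping every separator is a blunt choice: it makes bracket and hyphen variants compare equal, but it also merges locants such as “1,6” into “16”, so a production index would likely need a more careful rule.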

In the previous example, the indexing method 501 found the term, “Mineral oil” in the text of the document and its associated reference to the material identification key “1476” 902. Once this identification has occurred, the material identification key 516 or 531 and thus the document reference may be related to all other references associated with the master material key 524. The master material key 524 may include direct references, synonyms, translations, or chemical or material classes. As an example of a material class, the material, “mineral oil” with its material identification key can be considered a member of the class, “cottonseed and other edible oils”. Thus the master material identification key 524 would include a reference to this class.

The indexing method 501 of this embodiment of the present invention can be used to support a search for “mineral oil” that will return a reference to any available document that uses the term “cottonseed and other edible oils”. In this example, the benefit of this cross-indexing is that the user would be able to rapidly determine that “mineral oil” is permitted as a surface lubricant by FDA in the production of resinous and polymeric coatings used in food contact uses under 21 CFR §175.300.
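Expanding a search term to its parent classes before looking up documents can be sketched as follows. The class memberships mirror the examples in the text, while the table layout, the function names and the document file names are hypothetical.

```python
# Child term -> parent classes; memberships follow the examples in the text.
PARENTS = {
    "MINERAL OIL": {"COTTONSEED AND OTHER EDIBLE OILS"},
    "SODIUM CHROMATE": {"CHROME VI COMPOUNDS"},
}

# Indexed term -> documents containing it; file names are invented.
DOCUMENTS = {
    "MINERAL OIL": {"21cfr178.2010.pdf"},
    "COTTONSEED AND OTHER EDIBLE OILS": {"21cfr175.300.pdf"},
}


def ancestors(term):
    """Collect every parent class reachable from the term."""
    seen, stack = set(), [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def search_with_classes(term):
    """Map each matching document to the indexed names that caused the hit."""
    key = term.upper()
    hits = {}
    for name in {key} | ancestors(key):
        for doc in DOCUMENTS.get(name, ()):
            hits.setdefault(doc, set()).add(name)
    return hits


if __name__ == "__main__":
    for doc, names in sorted(search_with_classes("mineral oil").items()):
        print(doc, sorted(names))
```

Recording which name produced each hit lets the result show the user that a document applies through a class reference rather than a direct one.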

The method 501 for indexing and searching chemical terms has significant, beneficial applications that assist in improving the safe and globally compliant use of products, for instance. The method 501 of the present invention permits a cross-referencing of documents by: (a) preferred chemical name; (b) identifier; (c) translation; (d) synonym; and (e) membership in a parent class.

Referring now to FIG. 7, a data processing system is illustrated as a preferred embodiment of the present invention that returns, in response to a search based on the index described above for a particular material name, “crotonic acid”, found references to documents that include a reference to a synonym 703, one or more references to parent classes 701, 702, or a translated name 703. In this illustration of a preferred embodiment, the hyperlink to the document opens at the page of the document on which the particular reference is found.

Another aspect of this embodiment of the present invention is that the indexing method can be applied to a subset, such as those documents that have changed over a particular period of time.

Referring now to FIG. 10, an embodiment of the present invention is illustrated in which the body of documents selected for search are “new” or “changed”. The subset of documents subject to indexing and search may be assembled by either manual or automatic means. For example, manual research may have identified amendments to a certain topical area of regulations over a period of time, or subsequent studies that have been published, or other subsequent publications during the given period. Automatic means may select a document subset as well.

In consequence, the present embodiment supports a search for a chemical that is affected by a change, whether the document that has changed refers directly to the chemical term, uses an identifier, a translation, or refers to a chemical class encompassing the term.
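Restricting a search to the subset of new or changed documents can be sketched as a filter applied on top of the index. All names and data below are hypothetical placeholders; `equivalents` stands in for the synonyms, identifiers, translations and classes that the cross-index would supply.

```python
from datetime import date


def changed_documents(documents, start, end):
    """Select documents whose change date falls within the period."""
    return {address for address, changed in documents.items()
            if start <= changed <= end}


def search_changes(term, equivalents, occurrences, documents, start, end):
    """Find changed documents that mention the term or any equivalent name."""
    subset = changed_documents(documents, start, end)
    wanted = {term.upper()} | equivalents.get(term.upper(), set())
    return sorted({address for name, address in occurrences
                   if name in wanted and address in subset})


if __name__ == "__main__":
    # Placeholder data: file names and dates are invented for illustration.
    documents = {"new_positive_list.pdf": date(2006, 3, 1),
                 "older_regulation.pdf": date(2001, 1, 1)}
    occurrences = [("ACETONE", "older_regulation.pdf"),
                   ("PROPANONE", "new_positive_list.pdf")]
    equivalents = {"ACETONE": {"PROPANONE"}}
    print(search_changes("acetone", equivalents, occurrences, documents,
                         date(2006, 1, 1), date(2006, 12, 31)))
```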

Referring now to FIG. 11, a search for the term “acetone” 1101 returns a reference to a Mercosur Agreement defining a positive list of substances permitted for use in plastics manufactured in South American countries subject to the Mercosur Agreement. This document is a member of a subset of documents added or changed during a given period of time. The benefit of the present embodiment is that a researcher can more effectively keep abreast of changes that affect his or her use of materials that are referenced in newly published regulations or documents through the present data processing system. At present, the researcher must perform this task manually or through partially automated searches.

Referring now to FIGS. 10A, 10B, and 10C, an embodiment of the present invention is illustrated providing the steps to index chemical entries in a new, subset, or changed document that includes a reference to at least one material, so that the data processing system can permit a search for a given chemical in order to return from a web server not only a direct reference but also a compilation of references to documents containing the chemical term including: (a) documents containing a synonymous reference to the chemical term; (b) documents containing a translated reference to the chemical term; (c) documents containing a reference to an identifier associated with the chemical term; (d) documents containing a reference to a parent class of which the chemical term is a member.

Referring now to FIG. 10C, in a method 1001 at a step 1002, the document is input, preferably in electronic format, such as HTML, Adobe PDF, Microsoft Excel, Microsoft Word, or in computer readable form such as XML or other document exchange format, and preferably input into a document database 1030. The software of the present invention extracts all of the words or alphanumeric strings in the document in a step 1003 and tests that each is a word and not simply a sequence of control codes or other information. In one embodiment of the present invention, a found word is compared to a dictionary of common words in order to suppress words that are not chemical terms, which has the effect of decreasing the size of the database and increasing the speed of the indexing. Once the words are found in the document, a record in the index database is stored. In a preferred embodiment of the present invention the word is stored, including the address of the document, whether a filename or hyperlink, the page on which the word is found, and the position of the word on the page. One method for defining the position of the term on the page is to define a rectangular area that encompasses the area of the word on the page, and to store the coordinates of the rectangle in the index record.

In a step 1004 the relevance of the extracted words or alphanumeric strings in the document as stored in the index is determined by comparing, in sequence and in combination, the extracted words or alphanumeric strings to a dictionary database of material terms. The material terms dictionary database contains at least one of the following: a unique material identification code, a preferred name of the instance of the material, and a name used for matching against a document term (Matchname). An extracted word or alphanumeric string is compared to an index of Matchnames from the database of material terms, such as by retrieving the extracted word or alphanumeric string from a document index, discussed in more detail below. If a match occurs in a step 1005, an index record is prepared for at least one of the following: (a) found chemical, (b) document, (c) document address, (d) material code (in this case for the subset of documents), after which the record is stored in the materials index in a step 1006. As a result of this embodiment of the indexing system, a particular chemical name is indexed against any entry in a plurality of documents in the document database.

Referring to FIG. 10B, the sub-step 1004 is shown in greater detail. After the alphanumeric strings have been extracted in the step 1003, the first alphanumeric string from the document subset is extracted in a step 1035. In a step 1036, the alphanumeric string is prepared for matching, such as by converting lower case characters to upper case, stripping punctuation, converting Greek letters to alphanumeric equivalents or the like. In a step 1037, the alphanumeric string is compared with a dictionary database of material terms. In a step 1038, if a match occurs, the method proceeds to the step 1039 and the matched term is added to a stored set of matching material terms, after which the matching terms are combined in a step 1041 and the method proceeds to the step 1044 to test if there are more strings in the document. If there are no more strings, the sub-step 1004 ends and the method 1001 proceeds to the step 1005. If there are more strings, the method returns to the step 1035 to begin a new determination of relevance.

If no match occurs in the step 1038, the method tests, in a step 1040, whether any stored material terms exist. If a stored material term does exist, the method proceeds to a step 1042 where a stored matching material term is retrieved, which is accepted as a material reference, and the method writes an index record in a step 1043, after which the method proceeds to the step 1044 to test if there are more strings in the document. If there are no more strings, the sub-step 1004 ends and the method 1001 proceeds to the step 1005. If there are more strings, the method returns to the step 1035 to begin a new determination of relevance. If no stored material terms exist in the step 1040, the method proceeds to the step 1044 to test if there are more strings in the document. If there are no more strings, the sub-step 1004 ends and the method 1001 proceeds to the step 1005. If there are more strings, the method returns to the step 1035 to begin a new determination of relevance.

Referring now to FIG. 10A, the materials database or dictionary database of common material terms according to one embodiment of the present invention is illustrated at 1015 as including one or more of the following: (a) a unique material identification code or key 1016; (b) a proper chemical name 1017; (c) synonyms 1018; (d) identifiers, e.g., an EU Reference Number, an EINECS number, etc. 1019; (e) translations of chemical names 1020; (f) members of parent classes, e.g., sodium chromate is a member of the class hexavalent chromium compounds; and other material attributes.

The change document database is illustrated at 1030 as including one or more of the following: (a) a unique material identification code or key 1031, which is related to 1016; (b) a document address or filename using the Matchname of the material; (c) a page number for the occurrence of the Matchname 1033; and (d) the location on the page of the term 1034.

A master materials database is shown at 1023 and defines the relationship with a master “material” that encompasses one or more entries in the materials database 1015. A master material record has one or more of the following elements: (a) a unique master materials code or key 1024; (b) a proper master name 1025; (c) master identifiers associated with the master material 1026; (d) attributes associated with the master material 1027; (e) an identification that this is a parent class.
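As a minimal sketch only, the three record types described above might be represented as follows. The field types, the (line, column) form of the page location, and the member_keys link from a master material to entries of the materials database are assumptions made for illustration; the reference numerals in the comments follow the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MaterialRecord:
    """One entry in the materials database 1015."""
    material_key: str                                        # 1016
    proper_name: str                                         # 1017
    synonyms: List[str] = field(default_factory=list)        # 1018
    identifiers: List[str] = field(default_factory=list)     # 1019
    translations: List[str] = field(default_factory=list)    # 1020
    parent_classes: List[str] = field(default_factory=list)  # item (f)


@dataclass
class DocumentRecord:
    """One occurrence recorded in the document database 1030."""
    material_key: str                  # 1031, related to 1016
    document_address: str              # item (b)
    page_number: int                   # 1033
    location_on_page: Tuple[int, int]  # 1034; (line, column) is an assumption


@dataclass
class MasterMaterialRecord:
    """One entry in the master materials database 1023."""
    master_key: str                                           # 1024
    master_name: str                                          # 1025
    identifiers: List[str] = field(default_factory=list)      # 1026
    attributes: Dict[str, str] = field(default_factory=dict)  # 1027
    is_parent_class: bool = False                             # item (e)
    member_keys: List[str] = field(default_factory=list)      # hypothetical link to 1016 keys


# Example values are invented for illustration.
chromate = MaterialRecord("M-0001", "sodium chromate",
                          parent_classes=["hexavalent chromium compounds"])
occurrence = DocumentRecord(chromate.material_key, "msds-17.pdf", 2, (14, 3))
master = MasterMaterialRecord("MM-01", "hexavalent chromium compounds",
                              is_parent_class=True,
                              member_keys=[chromate.material_key])
```

Sharing the material key between the document record and the material record is what relates 1031 to 1016 in this sketch.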

After the method 1001 is complete, a cross-index is generated by retrieving a matched alphanumeric string from the step 1005, matching the retrieved alphanumeric string with a matching master materials database key 1024 and cross-indexing the matching records in the document database 1030, the materials database 1015, and the master materials database 1023. In this illustration of the present invention, the resulting cross-index permits a particular occurrence of a chemical term to be related not only to all indexed documents in the document database 1030 containing the term, but to any synonym, translation, identifier or membership in a broad chemical class in the materials database 1015 or the master materials database 1023.
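One way the cross-index might be generated is sketched below, under stated assumptions. The three databases are represented by hypothetical in-memory structures, and each materials entry is assumed to carry the key of its master material; the description does not specify how that relationship is stored. All keys, names and file names are invented for illustration.

```python
# Hypothetical stand-in for the materials database 1015:
# material key -> names (proper name, synonyms, translations) and master key.
MATERIALS = {
    "M-0001": {"names": {"SODIUM CHROMATE", "CHROMATE DE SODIUM"}, "master": "MM-01"},
    "M-0002": {"names": {"CHROMIUM TRIOXIDE"}, "master": "MM-01"},
}

# Hypothetical stand-in for the master materials database 1023: key 1024 -> name 1025.
MASTERS = {"MM-01": "HEXAVALENT CHROMIUM COMPOUNDS"}

# Hypothetical stand-in for the document database 1030: (material key, address, page).
DOCUMENTS = [
    ("M-0001", "msds-17.pdf", 2),
    ("M-0002", "msds-42.pdf", 1),
]


def cross_index(matched):
    """Relate a matched string to its master material, related names and documents."""
    material_key = next(
        (key for key, entry in MATERIALS.items() if matched in entry["names"]), None
    )
    if material_key is None:
        return None  # the string is not a known material term
    master_key = MATERIALS[material_key]["master"]
    # Every materials entry encompassed by the same master material.
    members = sorted(k for k, e in MATERIALS.items() if e["master"] == master_key)
    return {
        "master_key": master_key,
        "master_name": MASTERS[master_key],
        "related_names": sorted(n for k in members for n in MATERIALS[k]["names"]),
        "documents": sorted(addr for key, addr, _page in DOCUMENTS if key in members),
    }


result = cross_index("CHROMATE DE SODIUM")
```

In this sketch a match on the French translation is related to both documents, because both indexed materials belong to the same assumed master material.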

Embodiments of the present invention may also be practiced by a computer readable medium storing executable software code thereon for executing the indexing method and the system for validating certification documents in accordance with the present invention, as will be appreciated by those skilled in the art. Embodiments of the present invention may also be practiced by a device, such as a personal computer or the like, having a processor, wherein the processor is responsive to software instructions; and software instructions adapted to enable the processor to execute the indexing method and the system for validating certification documents in accordance with the present invention, as will be appreciated by those skilled in the art.

Embodiments of the present invention produce an advantageous technical effect by providing a data processing system that more effectively communicates parameters of use of products in a supply chain by utilizing dynamic control statements in certification documents that may be validated by a recipient of the certification document, thereby improving the safe and compliant use of the products in the supply chain by the recipient. The present invention thereby provides a further technical effect which lends technical character to the embodied computer programs in the control of an industrial process and in processing data that represents parameters of use of physical entities through the dynamic control statements. In consequence, the present invention provides a solution to a difficult supply chain management problem by the automatic validation of certification documents.

Embodiments of the present invention have been described in terms of preferred embodiments and nonlimiting examples; however, it will be appreciated that various modifications and improvements may be made to the described embodiments and examples without departing from the scope of the invention.

1. A method for indexing documents in a data processing system, said documents including a reference to at least one material, comprising the acts of: inputting a document into said data processing system; extracting at least one alphanumeric string from said document; determining relevant alphanumeric strings from said extracted alphanumeric strings by processing said extracted alphanumeric strings utilizing at least one algorithm by comparing, in sequence and in combination, said extracted alphanumeric strings with material terms in at least a dictionary database of common material terms; matching said relevant alphanumeric strings with materials alphanumeric strings stored in said data processing system; and storing said matched alphanumeric strings in respective matched records in said data processing system.

2. The method of claim 1 wherein said document is an updated document and further comprising: comparing said matched alphanumeric strings with said stored documents; and storing changed alphanumeric strings in said data processing system.

3. The method of claim 1 wherein said material terms include at least one of a proper name, an identifier, a synonym, a translation, class membership, a product, or combinations thereof.

4. The method of claim 1 wherein said step of storing said matched alphanumeric strings includes storing a location in said document of said matched alphanumeric string.

5. The method of claim 1 wherein said at least one material is one of a chemical substance, a biological agent, a formulation, and a finished article.

6. The method of claim 1 wherein said at least one algorithm compares said extracted alphanumeric string with material terms and, when a match is found, combines said extracted alphanumeric string with a subsequent alphanumeric string in said document and compares said combined alphanumeric string with material terms, said algorithm repeating to compare and combine said alphanumeric strings until no match with a material term is found and thereby matching said combined alphanumeric strings with said material.

7. A method for indexing documents in a data processing system, the documents including a reference of at least one of a chemical substance and a biological agent, comprising the acts of: inputting at least one document into the data processing system; extracting at least one alphanumeric string from the at least one document; determining relevant chemical and biological strings from the extracted alphanumeric strings by processing the extracted alphanumeric strings using at least one algorithm which compares in sequence and in combination the extracted alphanumeric strings with at least one of chemical terms and biological terms in at least a dictionary database of at least one of common chemical terms and common biological terms; matching the relevant chemical and biological strings identified in the determining act with at least one of chemical strings and biological strings stored in the data processing system; and storing the matched chemical and biological strings in matched records in said data processing system associated with the at least one document.